On the Expressiveness of Information Extraction Patterns

Authors

  • Mark A. Greenwood
  • Mark Stevenson

Abstract

Many recently reported machine learning approaches to the acquisition of information extraction (IE) patterns have used dependency trees as the basis for their pattern representations (Yangarber et al., 2000a; Yangarber, 2003; Sudo et al., 2003; Stevenson and Greenwood, 2005). While varying results have been reported for the resulting IE systems, little has been reported about the ability of dependency trees, or of patterns extracted from them, to represent the relationships needed to perform IE. In this paper we evaluate how well a number of pattern representations derived from dependency trees can represent the relationships extracted by an IE system. The paper concludes by suggesting the use of the "linked chains" model, which represents around 94% of the possible relations in text without generating an unwieldy number of candidate extraction patterns.
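The abstract does not define the pattern models it compares. The following is a minimal sketch, assuming definitions commonly used in this line of work: a chain is a path descending from a verb node in the dependency tree, and a linked chain is a pair of chains that share the same verb root but diverge immediately below it. It also assumes, for illustration only, that a model "represents" a relation when some single pattern contains both entity nodes. The parse `TREE` and every function name are hypothetical; this is not the authors' implementation.

```python
from itertools import combinations

# Hypothetical dependency parse of "Acme appointed Smith as chairman."
# Each token maps to (head, coarse part of speech); the root has head None.
TREE = {
    "appointed": (None, "VERB"),
    "Acme": ("appointed", "NOUN"),
    "Smith": ("appointed", "NOUN"),
    "as": ("appointed", "ADP"),
    "chairman": ("as", "NOUN"),
}


def children(tree, node):
    """Return the dependents of `node`."""
    return [tok for tok, (head, _) in tree.items() if head == node]


def chains(tree):
    """Assumed chain model: every downward path of two or more nodes from a verb."""
    found = []

    def extend(path):
        for child in children(tree, path[-1]):
            longer = path + (child,)
            found.append(longer)
            extend(longer)

    for tok, (_, pos) in tree.items():
        if pos == "VERB":
            extend((tok,))
    return found


def linked_chains(tree):
    """Assumed linked-chain model: two chains sharing a verb root whose
    first nodes below that root differ."""
    return [
        (a, b)
        for a, b in combinations(chains(tree), 2)
        if a[0] == b[0] and a[1] != b[1]
    ]


def chain_covers(tree, x, y):
    """True if some single chain contains both entity tokens."""
    return any(x in c and y in c for c in chains(tree))


def linked_chain_covers(tree, x, y):
    """True if some linked chain contains both entity tokens."""
    for a, b in linked_chains(tree):
        nodes = set(a) | set(b)
        if x in nodes and y in nodes:
            return True
    return False


print(len(chains(TREE)), len(linked_chains(TREE)))     # 4 5
print(chain_covers(TREE, "Smith", "chairman"))         # False
print(linked_chain_covers(TREE, "Smith", "chairman"))  # True
```

On this toy parse, no single chain connects "Smith" to "chairman" because they hang from different branches below the verb, whereas a linked chain joining those two branches does. Under these assumptions, this is the kind of relation the linked-chains model can express and the plain chain model cannot.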

Publication date: 2005